Class 10

DATA1220-55, Fall 2024

Sarah E. Grabinski

2024-09-20

Homework 2

Homework Hints

  • Read the instructions! Some of the issues you’re having are because you did not follow them correctly.

  • Please answer in complete sentences where possible! I want you to practice effectively communicating data, and life is not a multiple choice question. I will be more clear about indicating this on future homework.

  • Real world distributions are harder to describe than idealized theoretical distributions. Combining visual and numeric summaries is more powerful than using either alone.

Campuswire Hints

  • Turn on notifications. Your question may have already been asked and answered. Campuswire can email you when there are new posts, so you can keep up with the discussion.

  • Be specific! A detailed question is more likely to get a (useful) answer than a general question.

  • Include code & error messages. It is much easier to troubleshoot “My document won’t render. I’ve copy-pasted the error message and the lines of code where it breaks.” than “My document won’t render.” Click here for more info on how to ask good debugging questions.

How can I get help with homework?

  • Read the textbook. Many of you are asking for additional examples. Luckily, there are tons we didn’t go over in the textbook.

  • Ask a question on our Campuswire class feed. I’m only one person, and I may not be able to give you a prompt answer. However, the 20+ other people in the class might be able to.

I will try to keep an eye on Campuswire posts between 4-6pm before the homework is due, but I have other things going on and might miss something.

Last time… defining probability

  • Probability: The proportion of times that a particular outcome would occur if we observed a random process an infinite number of times (\(\operatorname{P}(\operatorname{Event = A})\).

    • Ranges from 0 to 1 or 0% to 100%

    • \(0 \le \operatorname{probability} \le 1\)

    • Probability = Proportion

  • Random process: you know which outcomes are possible (i.e. the sample space) but you don’t know which outcome comes next

Last time… representing probability

  • Sample space: all possible outcomes of a random process (\(S\))

  • Disjoint events: events that CANNOT occur at the same time (mutually exclusive)

  • Complement: the complement of any event \(A\) which exists in sample space \(S\) is any outcome also in sample space \(S\) which is NOT \(A\) (\(A^C\) or \(A'\))

    • Complements are always disjoint

    • The probability of event A occuring OR the complement of event A occuring is always 1

  • Non-disjoint events: events that CAN occur at the same time

Last time… calculating probabilities

Remember…

  • \(\operatorname{P}(S)=1\)

  • \(\operatorname{P}(S)=\operatorname{P}(A)+\operatorname{P}(A')\)

  • \(\operatorname{P}(A)+\operatorname{P}(A')=1\)

  • \(\operatorname{P}(A')=1-\operatorname{P}(A)\)

Last time… population probability

  • Population Probability: the theoretical “true” probability of an outcome in the population of interest, the “ground truth” (\(p\))

\[ p=\frac{\operatorname{count}(\operatorname{events = A})}{\operatorname{count}(\operatorname{all events in sample space})} \]

Last time… sample probability

  • Sample Probability: the probability of an outcome observed in a sample of size \(n\) from a population with probability \(p\), an estimate of the population probability (\(\hat{p}_n\))

\[ \hat{p}_n=\frac{\operatorname{count}(\operatorname{observation = A})}{\operatorname{count}(\operatorname{observations in sample})} \]

Last time… Law of Large Numbers

  • How well the sample proportion \(\hat{p}_n\) represents the population proportion \(p\) depends on the size of the denominator.

  • As more observations are collected, the sample proportion \(\hat{p}_n\) of a particular outcome approaches the population proportion \(p\) of that outcome.

  • \(\lim_{n\to\infty} \hat{p}_n = p\) (As \(n \to \infty\), \(\hat{p}_n \to p\))

Last time… sampling WITH replacement

Sampling with replacement is like drawing a card from a deck, then shuffling it back in before drawing another card. Repetition is possible.

This time…

  • Sampling WITHOUT replacement

  • Independent and dependent processes

  • Calculating probabilities for 2 events

    • (General) Addition Rule

    • (General) Multiplication Rule

  • Probability distributions

Returning to the playlist shuffle example…

  • Extracted song names and artists from the “Taylor Swift Radio” playlist on Spotify

  • There are 50 songs on the playlist by 26 different artists.

  • Some artists have more than 1 song on the playlist, and 1 song was a collaboration between 2 artists.

  • The iPod Shuffle originally used random sampling WITH replacement to select the next song to play (“true” shuffle)

  • Spotify originally used random sampling WITHOUT replacement to select the next song to play (Fisher-Yates Algorithm)

Let’s define our sample space \(S\) again…

See board…

Population proportions by song for sampling WITH replacement

Population proportions by artist for sampling WITH replacement

Drawing the sample space: 1 outcome

See board…

Calculating probability for a single event

\[ \begin{aligned} \operatorname{P}(\operatorname{Next Song by Chappell Roan})&=\frac{\operatorname{count}(\operatorname{Songs by Chappell Roan})}{\operatorname{count}(\operatorname{All Possible Songs})} \\ &= \frac{7}{50} \\ &= 0.14 \end{aligned} \]

Describing the sample space: 2 disjoint outcomes

See board…

Addition Rule for Disjoint Outcomes

Describes the probability of event A or event B occurring (\(P(A \cup B)\))

\[\operatorname{P}(\operatorname{A or B}) = \operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B})\]

Example: Addition Rule for Disjoint Outcomes

See board…

General Addition Rule for Non-Disjoint Outcomes

Describes the probability of event A or event B occurring

\[\operatorname{P}(\operatorname{A or B}) = \operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B}) - \operatorname{P}(\operatorname{A and B})\]

Example: General Addition Rule for Non-Disjoint Outcomes

See board…

Independent Processes

  • Two random processes are independent if the outcome of process A provides no information about process B

    • You roll a die once and get a 3. You still don’t know what number you’ll roll next.

    • You aren’t more likely to land on heads when flipping a coin just because you also got heads on your last one.

Example: Independent Processes with “True” Shuffle

What if we listened to 2 songs in a row using “true” shuffle (i.e. sampling with replacement)?

Each song goes back into the “pot” after it is played and can be repeated.

What happens to the sample space after the first song?

The sample space does not change between events.

Defining the Sample Space

See board…

What is the probability we hear a song by Chappell Roan twice in a row?

  • 7 opportunities for song 1 to be by Chappell Roan

  • 7 opportunities for song 2 to be by chappell roan

\(7 \times 7 = 49 \operatorname{possibilities!}\)

  • 50 possible songs for song 1

  • 50 possible songs for song 2

\(50 \times 50 = 2500 \operatorname{possibilities!}\)

\(\operatorname{P}(\operatorname{A and B})=\frac{7 \times 7}{50 \times 50}\)

Multiplication Rule for Independent Processes

Describes the probability of event A and event B occurring

\[ \operatorname{P}(\operatorname{A and B})=\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B}) \]

Dependent Processes

  • Two random processes are dependent if the probability of process B changes based on the outcome of process A

    • You’re playing poker and cards are about to be dealt. The probability that you will receive an Ace changes as each card is distributed.

    • The chances that you’ll bring an umbrella with you when you leave the house changes depending on whether or not its raining.

Sampling WITHOUT Replacement

Sampling without replacement is like drawing a card from a deck, then drawing another card without putting the first one back. Repetition is NOT possible.

Example: Dependent Processes with Spotify Shuffle

What if we listened to 2 songs in a row using Spotify’s shuffle (i.e. sampling without replacement)

Song does NOT go back into the “pot” after it is played and CANNOT be repeated.

The sample space does change between events.

Defining the Sample Space for Dependent Processes

See board…

What is the probability we hear a song by Chappell Roan twice in a row?

  • 7 opportunities for song 1 to be by Chappell Roan

  • 6 opportunities for song 2 to be by chappell roan

\(7 \times 6 = 42 \operatorname{possibilities!}\)

  • 50 possible songs for song 1

  • 49 possible songs for song 2

\(50 \times 49 = 2450 \operatorname{possibilities!}\)

\(\operatorname{P}(\operatorname{A and B})=\frac{7 \times 6}{50 \times 49}\)

General Multiplication Rule for Dependent Processes

Describes the probability of event B occurring given that event A has already occurred

\[ \operatorname{P}(\operatorname{A and B})=\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B given A}) \] ## Distinguishing Independent and Dependent Processes

When processes are independent, \(\operatorname{P}(\operatorname{A and B})=\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B})\).

If \(\operatorname{P}(\operatorname{A and B})\ne\operatorname{P}(\operatorname{A}) \times \operatorname{P}(\operatorname{B})\), the processes are NOT independent.

Probability Density Functions

  • Probability Mass Function: categorical

  • Probabiity Density Function: numerical